13 research outputs found

    Recent advances in LVCSR: A benchmark comparison of performances

    Get PDF
    Large Vocabulary Continuous Speech Recognition (LVCSR), which is characterized by high variability of the speech signal, is the most challenging task in automatic speech recognition (ASR). Believing that the evaluation of ASR systems on relevant and common speech corpora is one of the key factors that help accelerate research, we present in this paper a benchmark comparison of the performance of current state-of-the-art LVCSR systems over different speech recognition tasks. Furthermore, we objectively identify the best-performing technologies and the best accuracy achieved so far on each task. The benchmarks show that Deep Neural Networks and Convolutional Neural Networks have proven their efficiency on several LVCSR tasks by outperforming the traditional Hidden Markov Models and Gaussian Mixture Models. They also show that, despite satisfying performance on some LVCSR tasks, the problem of large-vocabulary speech recognition is far from solved in others, where more research effort is still needed.
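Systems in benchmarks such as this one are typically ranked by word error rate (WER), the word-level edit distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal sketch of that metric (function name and inputs are illustrative, not from the paper):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance over word tokens / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # deletions only
    for j in range(len(h) + 1):
        dp[0][j] = j  # insertions only
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])  # match or substitution
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)
```

For example, `wer("the cat sat", "the cat")` counts one deletion against three reference words, giving 1/3.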

    The Role of Communication Technologies in Building Future Smart Cities

    Get PDF
    The world population is continuously growing and has reached a significant turning point: the number of people living in cities has surpassed the number living in rural areas. This puts national and local governments under pressure, because limited resources such as water, electricity, and transport must be optimized to cover the needs of citizens. Therefore, different tools, from sensors to processes, services, and artificial intelligence, are used to coordinate the usage of the infrastructures and assets of cities to build so-called smart cities. Different definitions and theoretical models of smart cities are given in the literature. However, a smart city can usually be modelled by a layered architecture, in which the communication and networking layer plays a central role. In fact, smart city applications rely on collecting field data from different infrastructures and assets, processing these data, taking intelligent control actions, and sharing information in a secure way. Thus, a reliable two-way communication layer is the basis of smart cities. This chapter introduces the basic concepts of this field and focuses on the role of communication technologies in smart cities. Potential technologies for smart cities are discussed, especially recent wireless technologies adapted to smart city requirements.

    Using data-driven and phonetic units for speaker verification

    Full text link
    A. E. Hannani, D. T. Toledano, D. Petrovska-Delacrétaz, A. Montero-Asenjo, J. Hennebert, "Using Data-driven and Phonetic Units for Speaker Verification", in Odyssey: The Speaker and Language Recognition Workshop, San Juan (Puerto Rico), 2006, pp. 1-6.
    Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two of the major problems that arise when phone-based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques provide a potential solution to these problems because they do not use transcribed data and can easily be applied on development data, minimizing the mismatches. In this paper we compare speaker recognition results using phonetic and data-driven decoders. To this end, we have compared the results obtained with a speaker recognition system based on data-driven acoustic units and phonetic speaker recognition systems trained on Spanish and English data. Results obtained on the NIST 2005 Speaker Recognition Evaluation data show that the data-driven approach outperforms the phonetic one and that further improvements can be achieved by combining both approaches.
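Phonetic speaker recognition of the kind compared here commonly scores a decoded phone stream against per-speaker n-gram statistics. A toy sketch of that idea, assuming a simple add-alpha smoothed bigram model over unit labels (all names and parameters are illustrative, not the authors' system):

```python
import math
from collections import Counter

def bigram_model(phones, alpha=1.0):
    """Add-alpha smoothed bigram probabilities estimated from a phone stream."""
    vocab = sorted(set(phones))
    pairs = Counter(zip(phones, phones[1:]))   # bigram counts
    history = Counter(phones[:-1])             # left-context counts
    return {(a, b): (pairs[(a, b)] + alpha) / (history[a] + alpha * len(vocab))
            for a in vocab for b in vocab}

def log_score(model, phones, floor=1e-9):
    """Log-likelihood of a phone stream under a bigram model."""
    return sum(math.log(model.get(p, floor)) for p in zip(phones, phones[1:]))
```

A verification decision would then compare the score of a test stream under the claimed speaker's model against a background model, accepting when the log-likelihood ratio exceeds a threshold; whether the units come from a phonetic or a data-driven decoder, the scoring machinery is the same.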

    Real-Time ASR from Meetings

    Get PDF
    The AMI(DA) system is a meeting room speech recognition system that has been developed and evaluated in the context of the NIST Rich Transcription (RT) evaluations. Recently, the "Distant Access" requirements of the AMIDA project have necessitated that the system operate in real time. Another, more difficult, requirement is that the system fit into a live meeting transcription scenario. We describe an infrastructure that has allowed the AMI(DA) system to evolve into one that fulfils these extra requirements. We emphasise the components that address the live and real-time aspects.

    The AMIDA 2009 Meeting Transcription System

    Get PDF
    We present the AMIDA 2009 system for participation in the NIST RT’2009 STT evaluations. Systems for close-talking, far field and speaker attributed STT conditions are described. Improvements to our previous systems are: segmentation and diarisation; stacked bottle-neck posterior feature extraction; fMPE training of acoustic models; adaptation on complete meetings; improvements to WFST decoding; automatic optimisation of decoders and system graphs. Overall these changes gave a 6-13% relative reduction in word error rate while at the same time reducing the real-time factor by a factor of five and using considerably less data for acoustic model training.

    System-independent ASR error detection and classification using Recurrent Neural Network

    Get PDF
    This paper addresses errors in continuous Automatic Speech Recognition (ASR) in two stages: error detection and error type classification. Unlike the majority of research in this field, we propose to handle recognition errors independently from the ASR decoder. We first establish an effective set of generic features derived exclusively from the recognizer output to compensate for the absence of ASR decoder information. Then, we apply variant Recurrent Neural Network (V-RNN) based models for error detection and error type classification. Such models learn information additional to the recognized-word classification by exploiting label dependency. As a result, experiments on the Multi-Genre Broadcast Media corpus have shown that the proposed generic feature setup achieves competitive performance compared to state-of-the-art systems in both tasks. Furthermore, we have shown that a V-RNN trained on the proposed feature set is an effective classifier for ASR error detection, with an accuracy of 85.43%.
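As a structural illustration only (not the authors' V-RNN, whose feature set and label-dependency mechanism are specific to the paper), a recurrent tagger that assigns each recognized word an error probability from per-word features derived from the recognizer output might look like this minimal, untrained Elman-style sketch:

```python
import numpy as np

def rnn_error_tagger(feats, Wx, Wh, b, Wo, bo):
    """Tag each recognized word with P(error) via a simple recurrent pass.

    feats: (T, d) matrix of per-word features taken from the recognizer
    output (e.g. confidence score, word length, LM score) -- a hypothetical
    feature set, standing in for the paper's generic features."""
    T, d = feats.shape
    h = np.zeros(Wh.shape[0])          # recurrent hidden state
    probs = []
    for t in range(T):
        h = np.tanh(feats[t] @ Wx + h @ Wh + b)          # recurrence
        probs.append(1.0 / (1.0 + np.exp(-(h @ Wo + bo))))  # sigmoid output
    return np.array(probs)              # one error probability per word
```

The recurrence is what lets the prediction for word t depend on the labels and features of preceding words, which is the intuition behind exploiting label dependency; in practice the weights would of course be trained on annotated ASR output.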

    Applying an ant colony inspired approach to recommend learning paths in an online course: model and experiment

    No full text
    In this article, we present the implementation, experimentation, and evaluation of an approach for recommending learning paths in an online course. The recommendation process is inspired by swarm intelligence, and more particularly by ant colony optimization (ACO). In this context, we considered a differentiation of learning paths according to the activity explored for learning a course. With the objective of recommending learning paths considered optimal, and thus evaluating their impact on learning in an online course, the proposed approach is based both on the recommendation of relevant paths by the teacher and on the results progressively stored by learners along the paths they take. Our approach was validated experimentally, and the results obtained showed the emergence of a learning path favouring the success of a relatively considerable number of learners.
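The evaporation-and-reinforcement loop at the heart of ACO-style recommendation can be sketched as follows; the path names, rates, and update rule are illustrative assumptions, not the article's exact model:

```python
import random

def recommend_paths(paths, success_prob, learners=500, rho=0.1, q=1.0, seed=0):
    """ACO-inspired sketch: each simulated learner picks a learning path with
    probability proportional to its pheromone; a successful outcome deposits
    pheromone on the path taken, and evaporation discounts old traces."""
    rng = random.Random(seed)
    pher = {p: 1.0 for p in paths}      # equal initial pheromone
    for _ in range(learners):
        # roulette-wheel selection proportional to pheromone
        total = sum(pher.values())
        r, acc, choice = rng.random() * total, 0.0, paths[-1]
        for p in paths:
            acc += pher[p]
            if r <= acc:
                choice = p
                break
        # evaporation on every path
        for p in paths:
            pher[p] *= (1 - rho)
        # a successful learner reinforces the path it took
        if rng.random() < success_prob[choice]:
            pher[choice] += q
    return pher
```

For instance, `recommend_paths(["A", "B"], {"A": 0.9, "B": 0.3})` simulates learners whose stored results favour path A; over time the positive feedback between selection and reinforcement makes one path emerge as the recommended one, mirroring the emergence of a dominant learning path reported in the article.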
